Spatial Analyses
Chapter 12
As in previous weeks, for today’s lab you are to create a new
RMarkdown document using the class lab report template.
Within your .Rmd document, include all you code, resulting
plots, and answers to questions below.
When you are done with your report, use knitr to convert it to .PDF format to submit on Canvas. It is important that you document each step of your workflow using comments and that you break up the sections of your analysis into SEPARATE code chunks.
In this chapter you will learn about collection, wrangling, and visualization of spatial data.
Light Detection and Ranging (LiDAR) is an active remote sensing technique that measures vegetation height. Learn more about discrete and full waveform LIDAR and how to use LIDAR data.
After completing this tutorial, you will be able to:
Lidar or Light Detection and Ranging is an active remote sensing system that can be used to measure vegetation height across wide areas.
Scientists often need to characterize vegetation over large regions. We use tools that can estimate key characteristics over large areas because we don’t have the resources to measure each individual tree. These tools often use remote methods. Remote sensing means that we aren’t actually physically measuring things with our hands, we are using sensors which capture information about a landscape and record things that we can use to estimate conditions and characteristics.
To measure vegetation across large areas we need remote sensing methods
that can collect many measurements quickly using automated sensors.
These measurements can be used to estimate forest structure across
larger areas. Lidar (also sometimes referred to as active laser
scanning) is one remote sensing method that can be used to map structure
including vegetation height, density and other characteristics across a
region. Lidar directly measures the height and density of vegetation (as
well as buildings and other objects) on the ground making it an ideal
tool for scientists studying vegetation over large areas.
Lidar is an active remote sensing system. An active system means that the system itself generates energy - in this case light - to measure things on the ground. In a lidar system, light is emitted from a rapidly firing laser. This light travels to the ground and reflects off of things like buildings and tree branches. The reflected light energy then returns to the lidar sensor where it is recorded.
A lidar system measures the time it takes for emitted light to travel to the ground and back. That time is used to calculate distance traveled. Distance traveled is then converted to elevation. These measurements are made using the key components of a lidar system including a GPS that identifies the X,Y,Z location of the light energy and an Internal Measurement Unit (IMU) that provides the orientation of the plane in the sky.
Light energy is a collection of photons. As the photons that make up light move towards the ground, they hit objects such as branches on a tree. Some of the light reflects off of those objects and returns to the sensor. If the object is small and there are gaps surrounding it that allow light to pass through, some light continues down towards the ground. Because some photons reflect off of things like branches but others continue down towards the ground, multiple reflections may be recorded from one pulse of light.
The distribution of energy that returns to the sensor creates what we call a waveform. The amount of energy that returned to the lidar sensor is known as “intensity”. The areas where more photons or more light energy returns to the sensor create peaks in the distribution of energy. Theses peaks in the waveform often represent objects on the ground like a branch, a group of leaves or a building.
There are many different uses for lidar data:
Lidar data have also been used to derive information about vegetation structure including:
A discrete return lidar system identifies individual (discrete) peaks and records a point at each peak location in the waveform curve. These discrete or individual points are called returns. A discrete system may record 1-4 (and sometimes more) returns from each laser pulse.
A full waveform lidar system records a distribution of returned light energy. Full waveform lidar data are thus more complex to process, however, they can often capture more information compared to discrete return lidar systems.
Whether they are collected as discrete points or full waveform, most often lidar data are available as discrete points. A collection of discrete return lidar points is known as a lidar point cloud.
The commonly used file format to store lidar point cloud data is the
.las format. The .laz format is a highly
compressed version of .las and is becoming more widely
used.
Lidar data attributes can vary, depending upon how the data were collected and processed. You can determine what attributes are available for each lidar point by looking at the metadata.
All lidar data points will have:
X,Y Location Information: determines the x,y coordinate location of the object that the lidar pulse (the light) reflected off of
Z (elevation values): represents the elevation of the object that the lidar pulse reflected off of
Most lidar data points will have:
Some lidar point cloud data will also be “classified”. Classification refers to tagging each point with the object that it reflected off of. So if a pulse reflects off of a tree branch, you would assign it to the class “vegetation.” If the pulse reflects off of the ground, you would assign it to the class “ground.” Classification of lidar point clouds is an additional processing step. Classification simply represents the type of object that the laser return reflected off of. So if the light energy reflected off of a tree, it might be classified as “vegetation”. And if it reflected off of the ground, it might be classified as “ground”.
Some lidar products will be classified as “ground/non-ground.” Some datasets will be further processed to determine which points reflected off of buildings and other infrastructure. Some lidar data will be classified according to the vegetation type.
Lidar Point Clouds
The point cloud is one of the commonly found lidar
data products and is the “native” format for discrete return lidar data.
In this lesson you will explore some point cloud data using the
plas.io viewer.
We will open a .las file, in the plas.io
free online lidar data viewer. You will then explore some of the
attributes associated with a lidar data point cloud.
Remember that not all lidar data are created equally. Different lidar data may have different attributes. In this activity, you will look at data that contain both intensity values and a ground vs non ground classification.
plas.io.laz format point
cloud datasets that you will use in this lesson..laz
files into the plas.io website. window.plas.io toolbar in the “Data”
section If the data imported into the plas.io viewer
correctly, you should see something similar to the screenshot
below:
Notice that the data, upon initial view, are colored in a black - white color scheme. These colors represent the data’s intensity values. Remember that the intensity value for each lidar point represents the amount of light energy that reflected off of an object and returned to the sensor. In this case, darker colors represent LESS light energy returned. Lighter colors represent MORE light returned.
Next, scroll down through the tools in plas.io. Look for the Intensity Scaling slider. The intensity scaling slider allows you to define the thresholds of light to dark intensity values displayed in the image (similar to stretching values in an image processing software or even in photoshop).
Drag the slider back and forth. Notice that you can brighten up the data using the slider.
In addition to intensity values, these lidar data also have a classification value. Lidar data classification values are numeric, ranging from 0-20 or higher. Some common classes include:
In this case, these data are classified as either ground, or non-ground. To view the points, colored by class:
Point clouds provide a lot of information, scientifically. However, they can be difficult to work with given the size of the data and tools that are available to handle large volumns of points. Lidar data products are often created and stored in a gridded or raster data format. The raster format can be easier for many people to work with and also is supported by many different commonly used software packages.
Raster or “gridded” data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. A raster file is a composed of regular grid of cells, all of which are the same size. You’ve looked at and used rasters before if you’ve looked at photographs or imagery in a tool like Google Earth. However, the raster files that you will work with are different from photographs in that they are spatially referenced. Each pixel represents an area of land on the ground. That area is defined by the spatial resolution of the raster.
A few notes about rasters:
A raster dataset can have attributes associated with it as well. For instance in a lidar derived digital elevation model (DEM), each cell represents an elevation value for that location on the earth. In a lidar derived intensity image, each cell represents a lidar intensity value or the amount of light energy returned to and recorded by the sensor.
One of the most basic ways to create a raster from lidar point clouds is called gridding. When you grid raster data, you calculate a value for each pixel or cell in your raster dataset using the points that are spatially located within that cell. To do this:
To work with raster data in R, you can use the
raster and sf packages.
# load libraries
library(raster)
library(sf)
library(ggplot2)
You use the raster("path-to-raster-here") function to
open a raster dataset in R. Note that you use the
plot() function to plot the data. The function argument
main = "" adds a title to the plot.
# open raster data
lidar_dem <- raster(x = "spatial/BLDR_LeeHill/pre-flood/lidar/pre_DTM.tif")
# plot raster data
plot(lidar_dem,
main = "Digital Elevation Model - Pre 2013 Flood")
If you zoom in on a small section of the raster, you can see the individual pixels that make up the raster. Each pixel has one value associated with it. In this case that value represents the elevation of ground.
Note that you are using the xlim= argument to zoom in to
on region of the raster. You can use xlim and
ylim to define the x and y axis extents for any plot.
# zoom in to one region of the raster
plot(lidar_dem,
xlim = c(473000, 473030), # define the x limits
ylim = c(4434000, 4434030), # define y limits for the plot
main = "Lidar Raster - Zoomed into one small region")
Next, let’s discuss some of the important spatial attributes associated with raster data.
Coordinate Reference System
The coordinate reference system (CRS) of a spatial
object tells R where the raster is located in geographic
space. It also tells R what method should be used to
“flatten” or project the raster in geographic space.
You will learn CRS in more detail later. For now, just
know that data from the same location but saved in different coordinate
references systems will not line up on a map. Thus, it’s important when
working with spatial data to identify the coordinate reference system
applied to the data and retain it throughout data processing and
analysis.
You can view the CRS string associated with your
R object using the crs() method. You can
assign this string to an R object too.
# view resolution units
crs(lidar_dem)
## CRS arguments:
## +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
# assign crs to an object (class) to use for reprojection and other tasks
myCRS <- crs(lidar_dem)
myCRS
## CRS arguments:
## +proj=utm +zone=13 +datum=WGS84 +units=m +no_defs +ellps=WGS84
## +towgs84=0,0,0
The CRS string for our lidar_dem object
tells us that your data are in the UTM projection.
The CRS format, returned by R, is in a
PROJ 4 format. This means that the projection information
is strung together as a series of text elements, each of which begins
with a + sign.
+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
You’ll focus on the first few components of the CRS in this activity:
+proj=utm The projection of the dataset. Your data are
in Universal Transverse Mercator (UTM).+zone=18 The UTM projection divides up the
world into zones, this element tells you which zone the data is in.
Harvard Forest is in Zone 18.+datum=WGS84 The datum was used to define the center
point of the projection. Your raster uses the WGS84 datum.+units=m This is the horizontal units
that the data are in. Your units are meters.
Important: You are working with lidar data which has a Z or vertical value as well. While the horizontal units often match the vertical units of a raster they don’t always! Be sure to check the metadata of your data to figure out the vertical units!
Next, let’s discuss spatial extent. The spatial extent of a raster or spatial object is the geographic area that the raster data covers.
The spatial extent of an R spatial object represents the
geographic “edge” or location that is the furthest north, south, east
and west. In other words, extent represents the overall
geographic coverage of the spatial object.
A raster has horizontal (x and y) resolution. This resolution
represents the area on the ground that each pixel covers. The units for
your data are in meters. In this case, your data resolution is 1 x 1.
This means that each pixel represents a 1 x 1 meter area on the ground.
You can figure out the units of this resolution using the
crs() function which you will use next.
# what is the x and y resolution for your raster data?
xres(lidar_dem)
## [1] 1
yres(lidar_dem)
## [1] 1
Resolution as a number doesn’t mean anything unless you know the
units. You can figure out the horizontal (x and y) units from the
coordinate reference system string using crs().
Notice the output contains an element called units=m.
This means the units are in meters. We won’t get into much detail about
coordinate reference strings in this class but they are important to be
familiar with when working with spatial data.
Let’s plot our raster data.
# plot raster data
plot(lidar_dem,
main = "Digital Elevation Model - Pre 2013 Flood")
We can also pull the data from the raster, such as elevation, and plot it in traditional graphs, such as a histogram. A histogram of elevations represents the distribution of pixel elevation values in your data. This plot is useful to:
# plot histogram
hist(lidar_dem,
main = "Distribution of surface elevation values",
xlab = "Elevation (meters)", ylab = "Frequency",
col = "springgreen")
Notice that you are using the xlab and ylab
arguments in your plot to label your plot axes.
Questions: 1. What does the histogram tell you?
breaks = to make larger bins. Enter the command
??hist() to read the help manual for the hist() function if
you need more information.
Vector Data
Vector data are composed of discrete geometric locations (x,y values) known as vertices that define the “shape” of the spatial object. The organization of the vertices determines the type of vector that you are working with: point, line or polygon.
Geospatial data in vector format are often stored in a shapefile format. Because the structure of points, lines, and polygons are different, each individual shapefile can only contain one vector type (all points, all lines or all polygons). You will not find a mixture of point, line and polygon objects in a single shapefile.
Objects stored in a shapefile often have a set of associated
attributes that describe the data. For example, a line
shapefile that contains the locations of streams, might contain the
associated stream name, stream “order” and other information about each
stream line object.
The shapefiles that you will import are:
R function read_sf().
read_sf() requires two components:
SJER_plot_centroids. You can call each element
separatelyread_sf("path","fileName")
Or you can simply include the entire path to the shp file in the path argument. Both ways to open a shapefile are demonstrated below:
# Import a polygon shapefile: read_sf("path","fileName")
sjer_plot_locations <- read_sf(dsn = "spatial/california/SJER/vector_data",
"SJER_plot_centroids")
When you import the SJER_plot_centroids shapefile layer
into R the readOGR() function automatically
stores information about the data. You are particularly interested in
the geospatial metadata, describing the format,
CRS, extent, and other components of the
vector data, and the attributes which describe
properties associated with each individual vector object.
Key metadata for all shapefiles include:
You can view shapefile metadata using the class, crs and extent methods:
# view just the class for the shapefile
class(sjer_plot_locations)
# view just the crs for the shapefile
crs(sjer_plot_locations)
# view just the extent for the shapefile
extent(sjer_plot_locations)
# view all metadata at same time
sjer_plot_locations
Each object in a shapefile has one or more attributes associated with it. Shapefile attributes are similar to fields or columns in a spreadsheet. Each row in the spreadsheet has a set of columns associated with it that describe the row element. In the case of a shapefile, each row represents a spatial object - for example, a road, represented as a line in a line shapefile, will have one “row” of attributes associated with it. These attributes can include different types of information that describe objects stored within a shapefile. Thus, your road, may have a name, length, number of lanes, speed limit, type of road and other attributes stored with it.
You view the attributes of a SpatialPointsDataFrame
using objectName@data (e.g.,
sjer_plot_locations@data).
Questions:
sjer_plot_locations?sjer_plot_locations have?
Let’s visualize the data in your R
spatialpointsdataframe object using
plot().
# create a plot of the shapefile
# 'pch' sets the symbol
# 'col' sets point symbol color
plot(sjer_plot_locations$geometry, col = "blue",
pch = 8)
title("SJER Plot Locations\nMadera County, CA")
A coordinate reference system (CRS) refers to the way in which spatial data that represent the earth’s surface (which is round / 3 dimensional) are flattened so that you can “Draw” them on a 2-dimensional surface. However each using a different (sometimes) mathematical approach to performing the flattening resulting in different coordinate system grids (discussed below). These approaches to flattening the data are specifically designed to optimize the accuracy of the data in terms of length and area (more on that later too).
To define the location of something you often use a coordinate system. This system consists of an X and a Y value located within a 2 (or more) -dimensional space.
While the above coordinate system is 2-dimensional, we live on a 3-dimensional earth that happens to be “round”. To define the location of objects on the Earth, which is round, you need a coordinate system that adapts to the Earth’s shape. When you make maps on paper or on a flat computer screen, you move from a 3-Dimensional space (the globe) to a 2-Dimensional space (your computer screens or a piece of paper). The components of the CRS define how the “flattening” of data that exists in a 3-D globe space. The CRS also defines the the coordinate system itself.
To define the location of something you often use a coordinate system. This system consists of an X and a Y value located within a 2 (or more) -dimensional space.
The coordinate reference system is made up of several key components:
Data Tip: spatialreference.org provides an excellent online library of CRS information.
It is important to understand the coordinate system that your data
uses - particularly if you are working with different data stored in
different coordinate systems. If you have data from the same location
that are stored in different coordinate reference systems, they will not
line up in any GIS or other program unless you have a program like
ArcGIS or QGIS that supports projection on the
fly. Even if you work in a tool that supports projection on the fly, you
will want to all of your data in the same projection for performing
analysis and processing tasks.
You can define a spatial location, such as a plot location, using an x- and a y-value - similar to your cartesian coordinate system displayed in the figure, above.
For example, the map below, generated in R with
ggplot2 shows all of the continents in the world, in a
Geographic Coordinate Reference System. The units are degrees
and the coordinate system itself is latitude and longitude with the
origin being the location where the equator meets the
central meridian on the globe (0,0).
Coding Tip: You can set your own custom theme and use it in all your ggplots. Here is one you can try out for subsequent plots.
# turn off axis elements in ggplot for better visual comparison
newTheme <- list(theme(line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(), # turn off ticks
axis.title.x = element_blank(), # turn off titles
axis.title.y = element_blank(),
legend.position = "none")) # turn off legend
Now let’s read the shape file and plot the map:
# read shapefile
worldBound <- read_sf(dsn = "global/ne_110m_land/ne_110m_land.shp")
# plot map using ggplot
worldMap <- ggplot(worldBound) +
geom_sf(fill = "black") +
labs(x = "Longitude (Degrees)",
y = "Latitude (Degrees)",
title = "Global Map - Geographic Coordinate System",
subtitle = "WGS84 Datum, Units: Degrees - Latitude / Longitude")
worldMap
You are able to add points to your map. Note that the UNITS are in decimal degrees (latitude, longitude):
Let’s create a second map with the locations overlayed on top of the continental boundary layer.
# define locations of Boulder, CO, Mallorca, Spain and Oslo, Norway
# store coordinates in a data.frame
loc_df <- data.frame(lon = c(-105.2519, 10.7500, 2.9833),
lat = c(40.0274, 59.9500, 39.6167))
# add a point to the map
mapLocations <- worldMap +
geom_point(data = loc_df,
aes(x = lon, y = lat, group = NULL), colour = "springgreen",
size = 5)
mapLocations
Questions:
In your own words, define what a Coordinate Reference System (CRS) is. What are the key components that make up a CRS?
If you are working with two datasets that are stored using difference CRSs, and want to process or plot them, what do you need to do to ensure that they line up on a map and can be processed together?
In R, how do you check to see if two datasets are in the same CRS?
We will need a few more packages for the next plots.
library(broom)
library(RColorBrewer)
library(dplyr)
# to add a north arrow and a scale bar to the map
# set factors to false
options(stringsAsFactors = FALSE)
Next, import and explore the data.
# import roads
sjer_roads <- read_sf("california/madera-county-roads/tl_2013_06039_roads.shp")
View attributes of and plot the data.
# view the original class of the TYPE column
class(sjer_roads$RTTYP)
unique(sjer_roads$RTTYP)
# quick plot using base plot
plot(sjer_roads,
main = "Quick plot of roads data")
It looks like you have some missing values in your road types. You
want to plot all road types even those that are NA. Let’s
change the roads with an RTTYP attribute of NA
to “unknown”.
# set all NA values to "unknown" so they still plot
sjer_roads$RTTYP[is.na(sjer_roads$RTTYP)] <- "Unknown"
unique(sjer_roads$RTTYP)
# plot the lines data
ggplot() +
geom_sf(data = sjer_roads) +
labs(title = "ggplot map of roads")
You can color each line by type too by adding the attribute that you
wish to use for categories or types to the
color = argument.
Below you set the colors to color = factor(RTTYP). Note
that you are coercing the attribute RTTYP to a factor. You
can think of this as temporarily grouping the data by the
RTTYP category for plotting purposes only. You aren’t
modifying the data you are just telling ggplot that the
data are categorical with explicit groups.
# plot the lines data
ggplot() +
geom_sf(data = sjer_roads, aes(color = factor(RTTYP))) +
labs(color = 'Road Types', # change the legend type
title = "Roads colored by the RTTP attribute")
Next, let’s add points to your map and and of course the map legend
too. You will import the shapefile
california/SJER/vector_data/sjer_plot_centroids.shp layer.
This data represents study plot locations from your field work in
southern California.
Let’s import that data and perform any cleanup that is required.
# import points layer
sjer_plots <- read_sf("spatial/california/SJER/vector_data",
"SJER_plot_centroids")
# given you want to plot 2 layers together, let's check the crs before going any further
crs(sjer_plots)
crs(sjer_roads)
sjer_plots <- st_transform(sjer_plots, crs(sjer_roads))s
Next, let’s plot the data using ggplot.
# plot point data using ggplot
ggplot(sjer_plots) +
stat_sf_coordinates() +
labs(title = "Plot locations")
Great! You’ve now plotted your data using ggplot. Let’s
next combine the roads with the points in one clean map.
You can layer multiple ggplot objects by adding a new
geom_ function to your plot. For the roads data, you used
geom_path() and for points you use
geom_point(). Note that you add an addition data layer to
your ggplot map using the + sign.
# plot lines and points together
ggplot() +
geom_sf(data = sjer_roads, color = "grey") +
labs(title = "ggplot map of roads") +
stat_sf_coordinates(data = sjer_plots)
Your roads layer is a much larger spatial extent compared to your
plots layer. To zoom in on your plots on the map, crop the map using
+ coord_sf(). Run ??coord_sf() for more
information
Create a map of Madera County roads with a legend:
Import the
california/madera-county-roads/tl_2013_06039_roads.shp
layer located in this week’s data download.
Create a map that shows the madera roads layer, sjer plot locations
and the sjer_aoi boundary (sjer_crop.shp).
Plot the roads so different road types are represented using unique symbology. Map the plot locations by the attribute plot type using unique symbology for each “type”.
Add a title to your plot.
Adjust your plot legend.
IMPORTANT: be sure that all of the data are within the same
extent and crs of the sjer_aoi
layer. This means that you may have to crop and reproject your data
prior to plotting it!
Great Work!
This lab activity was originally written by https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/ and modified by Jenna Ekwealor.